Performance Inconsistency in Large Scale Data Processing Clusters

Authors

Mingyuan Xia

Nan Zhu

Sameh Elnikety

Xue Liu

Yuxiong He

Abstract

A large shared computing platform is usually divided into several virtual clusters (VCs) of fixed sizes, each used by one team. A cluster scheduler dynamically allocates physical servers to the virtual clusters according to their sizes and current job demands. In this paper, we show that current cluster schedulers, which optimize for instantaneous fairness, cause performance inconsistency among the virtual clusters: virtual clusters with similar loads see very different performance characteristics. We identify this problem by studying a production trace obtained from a large cluster and performing a simulation study. Our results demonstrate that under an instantaneous-fairness scheduler, a large VC that contributes more resources during its underload periods cannot be properly rewarded during its overload periods. These results suggest that ignoring resource-sharing history is the root cause of the performance inconsistency.
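The abstract does not say which allocation algorithm the studied schedulers use. As a rough illustration only, the sketch below assumes instantaneous fairness means weighted max-min fairness, with each VC's fixed size as its weight. The function name `instantaneous_fair_share`, the two VCs, and all numbers are hypothetical and are not taken from the paper or its trace. The point it illustrates is that the allocation depends only on current demands, so a VC that lent idle servers earlier gets nothing extra later.

```python
from typing import Dict


def instantaneous_fair_share(capacity: float,
                             quota: Dict[str, float],
                             demand: Dict[str, float]) -> Dict[str, float]:
    """Weighted max-min fair allocation computed from current demands only.

    Assumption: this is one plausible reading of "instantaneous fairness";
    no sharing history is consulted.
    """
    alloc = {vc: 0.0 for vc in quota}
    active = {vc for vc in quota if demand[vc] > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_weight = sum(quota[vc] for vc in active)
        # Offer each unsatisfied VC a slice proportional to its size.
        offer = {vc: remaining * quota[vc] / total_weight for vc in active}
        satisfied = {vc for vc in active
                     if demand[vc] - alloc[vc] <= offer[vc] + 1e-9}
        if not satisfied:
            # Every remaining VC is overloaded: split by size and stop.
            for vc in active:
                alloc[vc] += offer[vc]
            break
        # Fully serve the VCs that need less than their slice,
        # then redistribute what they left unused in the next round.
        for vc in satisfied:
            remaining -= demand[vc] - alloc[vc]
            alloc[vc] = demand[vc]
        active -= satisfied
    return alloc


# Hypothetical setup: 100 servers, VC "A" sized 60 and VC "B" sized 40.
quota = {"A": 60, "B": 40}

# Period 1: A is underloaded, so B borrows A's idle servers.
period1 = instantaneous_fair_share(100, quota, {"A": 20, "B": 80})

# Period 2: both VCs are overloaded. A falls back to its size and
# receives no credit for the servers it lent in period 1.
period2 = instantaneous_fair_share(100, quota, {"A": 100, "B": 100})

print(period1)
print(period2)
```

Because the function takes no history argument, period 2 would come out the same even if A had lent nothing in period 1. That is the behavior the abstract identifies as the root cause of the inconsistency.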

Similar resources

Using data envelopment analysis (DEA) to improve the sales performance in Iranian agricultural clusters by utilizing business networks and business development services providers (BDSPs)

Business clusters play an important role in developing and improving the economic performance of countries and in promoting the welfare of people. Business development service providers (hereafter referred to as, BDSP) have a considerable role in providing specialized services pertinent to the conditions of active enterprises in clusters and in promoting their performance level in order to impr...

Determination of Cluster Hydrodynamics in Bubbling Fluidized Beds by the EMMS Approach

The local solid flow structure of gas-solid bubbling fluidized bed was investigated to identify and characterize the particle clusters. Extensive mathematical calculations were carried out using the energy-minimization multi-scale (EMMS) approach for evaluating cluster properties including the velocity, the size and the void fraction of clusters in the dense phase of the bed. The results showed...

Data Clustering Based on Key Identification

Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pair-wise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and their associated cluster structures. The recent trend in big data analysis poses a more demanding requirement on new clustering algorithms to be both scalable and accura...

Data-Replicas Scheduler for Heterogeneous MapReduce Cluster

Large-scale data processing has grown rapidly in recent years. The MapReduce programming model, which was first introduced in functional languages, appeared in distributed systems and has performed excellently in large-scale data processing since 2006. Hadoop, which is the most popular open-source MapReduce runtime framework, supplies reliable, scalable and distributed system processing la...

Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale – Public Review

The huge volume of data available today has led to interest in parallel processing on commodity clusters. Data analytics distributed frameworks such as Hadoop, Spark, or Pregel are designed for parallel processing of a large amount of data. These frameworks break a computation job into small tasks that run in parallel on multiple machines, and aim to scale to very large clusters of inexpensive ...

Publication date: 2013
